Overview:
- The Python ecosystem offers several container mechanisms for storing and manipulating multiple elements. These containers hold elements ranging from basic types such as bool to class objects in the main memory of a computer system. They include arrays as provided by the Python standard library, ndarrays of the NumPy library, and specialized containers built on top of ndarrays, such as the Series and DataFrame classes of the pandas library.
- During the lifetime of a Python program, these container instances can grow so large that the available main memory keeps shrinking and becomes a performance bottleneck.
- Depending on the sheer size of the data under consideration, methods like cluster computing can scale Python programs horizontally.
- Regardless of whether a Python program runs in a computing cluster or on a single system, it is essential to measure the amount of memory consumed by major data structures such as a pandas DataFrame.
- The memory_usage() method of the DataFrame class returns the memory consumption of a DataFrame instance column by column, in bytes.
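As a rough illustration of the points above, the following sketch calls memory_usage() on a small DataFrame built in memory. The column names and values are hypothetical and chosen only for demonstration. The index and deep parameters are part of the pandas API; the exact byte counts reported for the index and for text columns can vary between pandas versions and platforms, so they are not stated here.

```python
import pandas as pd

# Hypothetical in-memory data, used here instead of a CSV file
df = pd.DataFrame({
    "id": pd.Series([1, 2, 3, 4], dtype="int64"),
    "score": pd.Series([0.5, 1.5, 2.5, 3.5], dtype="float64"),
    "over_18": pd.Series([False, True, False, False], dtype="bool"),
    "title": ["a", "bb", "ccc", "dddd"],
})

# Column-wise memory consumption in bytes; the result is a Series
# whose first entry, labelled "Index", covers the row index
per_column = df.memory_usage()
print(per_column)

# Fixed-width columns cost (number of rows) x (bytes per element)
print(per_column["id"])       # 4 rows x 8 bytes (int64)
print(per_column["over_18"])  # 4 rows x 1 byte (bool)

# Total memory of the DataFrame instance in bytes
total_bytes = per_column.sum()
print(total_bytes)

# index=False leaves the row index out of the result
without_index = df.memory_usage(index=False)

# deep=True also counts the Python objects referenced by object columns,
# at the cost of a slower computation
deep_per_column = df.memory_usage(deep=True)
print(deep_per_column)
```

The default call is cheap and is usually enough for numeric data. For columns holding Python strings or other objects, deep=True tends to give a more realistic figure.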
Example:
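# Example Python program that computes the memory
# consumption of a pandas DataFrame
import pandas as pd

# Read a CSV file downloaded from kaggle
# (placeholder path; substitute the location of the CSV file)
df = pd.read_csv("path/to/file.csv")

print("Memory consumption of each DataFrame column in bytes:")
memory_per_column = df.memory_usage()
print(memory_per_column)

# Sum the column-wise values to get the total for the DataFrame
total_bytes = memory_per_column.sum()
print(f"Memory consumption of the DataFrame instance in bytes:{total_bytes} bytes")
print(f"Memory consumption in megabytes(MB): {total_bytes / (1024 * 1024):.2f} MB")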
Output:
Memory consumption of each DataFrame column in bytes:
Index                        128
id                       1471128
title                    1471128
score                    1471128
author                   1471128
author_flair_text        1471128
removed_by               1471128
total_awards_received    1471128
awarders                 1471128
created_utc              1471128
full_link                1471128
num_comments             1471128
over_18                   183891
dtype: int64
Memory consumption of the DataFrame instance in bytes:16366427 bytes
Memory consumption in megabytes(MB): 15.61 MB
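The figures in the output above can be cross-checked with simple arithmetic. The sketch below assumes that the bool column over_18 takes one byte per row, which would put the row count at 183891, and that each of the other eleven columns takes eight bytes per row. Both assumptions are inferred from the printed numbers, not stated by the source.

```python
# Figures taken from the output above
index_bytes = 128            # the "Index" entry
rows = 183891                # assumed: over_18 (bool) uses 1 byte per row
eight_byte_columns = 11      # id ... num_comments, each reported as 1471128

# 8 bytes per row for each of the eleven columns
bytes_per_wide_column = rows * 8

# Index + eleven 8-byte columns + one 1-byte column
total_bytes = index_bytes + eight_byte_columns * bytes_per_wide_column + rows

# Convert bytes to megabytes using 1 MB = 1024 * 1024 bytes
total_mb = round(total_bytes / (1024 * 1024), 2)

print(bytes_per_wide_column)  # 1471128
print(total_bytes)            # 16366427
print(total_mb)               # 15.61
```

The eight bytes per row for text columns such as title reflect only the references stored in the column, so the strings themselves are not included in this total.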